
Neural Information Processing Systems

In Figure 7 of this Appendix, we show that indeed this is due to a decrease in the robustness slope. Across three different datasets (MNIST, CIFAR10, NewsGroup20), we see that increasing the number of tasks leads to a decrease in the robustness slope.

Experiments on other languages

For our experiments on multilingual generative models, we decided to use Greek and English because we were looking for a linguistic pair with different morphology, syntax, and phonology. This ensures that any benefits in terms of robustness are not coming from exposure to more data. As shown in Figure 8, even though the two models are starting from roughly the same perplexity, the bilingual model exhibits higher structural robustness in the presence of weight deletions.
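The excerpt above does not spell out how the robustness slope is computed, but one minimal way to sketch the idea is to zero out a growing fraction of weights, measure how much the output degrades, and fit a line to loss versus deletion fraction. Everything below (the toy linear "model", the loss, the function names) is an illustrative assumption, not the paper's procedure.

```python
import numpy as np

def delete_weights(W, fraction, rng):
    # Zero out a random fraction of entries, simulating weight deletion.
    mask = rng.random(W.shape) >= fraction
    return W * mask

def robustness_slope(loss_fn, W, fractions, rng):
    # Fit a line to loss vs. deletion fraction; a flatter (smaller) slope
    # means the model degrades more gracefully, i.e. is more robust.
    losses = [loss_fn(delete_weights(W, f, rng)) for f in fractions]
    slope, _intercept = np.polyfit(fractions, losses, 1)
    return slope

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))   # toy single-layer "model"
x = rng.standard_normal(64)
y_ref = W @ x                       # output of the intact model
loss = lambda Wd: float(np.mean((Wd @ x - y_ref) ** 2))
slope = robustness_slope(loss, W, [0.0, 0.1, 0.2, 0.3, 0.4], rng)
```

Under this sketch, a bilingual model being "more structurally robust" would correspond to a smaller slope than its monolingual counterpart at matched initial perplexity.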



ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality

Longpre, Shayne, Kudugunta, Sneha, Muennighoff, Niklas, Hsu, I-Hung, Caswell, Isaac, Pentland, Alex, Arik, Sercan, Lee, Chen-Yu, Ebrahimi, Sayna

arXiv.org Artificial Intelligence

Scaling laws research has focused overwhelmingly on English -- yet the most prominent AI models explicitly serve billions of international users. In this work, we undertake the largest multilingual scaling laws study to date, totaling 774 multilingual training experiments, spanning 10M-8B model parameters, 400+ training languages and 48 evaluation languages. We introduce the Adaptive Transfer Scaling Law (ATLAS) for both monolingual and multilingual pretraining, which outperforms existing scaling laws' out-of-sample generalization, often by more than 0.3 R^2. Our analyses of the experiments shed light on multilingual learning dynamics, transfer properties between languages, and the curse of multilinguality. First, we derive a cross-lingual transfer matrix, empirically measuring mutual benefit scores between 38 x 38 = 1,444 language pairs. Second, we derive a language-agnostic scaling law that reveals how to optimally scale model size and data when adding languages without sacrificing performance. Third, we identify the computational crossover points for when to pretrain from scratch versus finetune from multilingual checkpoints. We hope these findings provide the scientific foundation for democratizing scaling laws across languages, and enable practitioners to efficiently scale models -- beyond English-first AI.
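The abstract does not give ATLAS's functional form, but the general workflow of fitting a scaling law can be sketched with a standard saturating power law, L(N) = E + A·N^(-alpha): with the irreducible loss E assumed known, the law becomes linear in log space and can be fit by ordinary least squares. The numbers below are synthetic and purely illustrative.

```python
import numpy as np

def power_law(N, E, A, alpha):
    # Loss as a function of model size N: irreducible loss E plus a
    # power-law term that shrinks as the model grows.
    return E + A * N ** (-alpha)

# Synthetic "experiments": losses at several model sizes, generated from a
# known law (E=1.7, A=400, alpha=0.3) so the fit can be checked exactly.
Ns = np.array([1e7, 1e8, 1e9, 8e9])
losses = power_law(Ns, 1.7, 400.0, 0.3)

# With E known, log(L - E) = log A - alpha * log N, a straight line:
coeffs = np.polyfit(np.log(Ns), np.log(losses - 1.7), 1)
alpha_hat = -coeffs[0]
A_hat = np.exp(coeffs[1])
```

Real scaling-law fits are noisier and typically estimate E jointly via nonlinear optimization; this sketch only shows the log-linear core of the procedure.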




Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models

Blandón, María Andrea Cruz, Aldeneh, Zakaria, Chi, Jie, de Seyssel, Maureen

arXiv.org Artificial Intelligence

Self-supervised learning (SSL) has made significant advances in speech representation learning. Models like wav2vec 2.0 and HuBERT have achieved state-of-the-art results in tasks such as speech recognition, particularly in monolingual settings. However, multilingual SSL models tend to underperform their monolingual counterparts on each individual language, especially in multilingual scenarios with few languages such as the bilingual setting. In this work, we investigate a novel approach to reduce this performance gap by introducing limited visual grounding into bilingual speech SSL models. Our results show that visual grounding benefits both monolingual and bilingual models, with especially pronounced gains for the latter, reducing the multilingual performance gap on zero-shot phonetic discrimination from 31.5% for audio-only models to 8.04% with grounding.
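The abstract reports the "multilingual performance gap" as a percentage without defining it; one plausible definition is the relative increase in the bilingual model's error over the monolingual baseline. The function and the error rates below are hypothetical, not taken from the paper.

```python
def multilingual_gap(bilingual_error, monolingual_error):
    # Relative gap (%) of the bilingual model's error over the monolingual
    # baseline; one plausible definition, not necessarily the paper's metric.
    return 100.0 * (bilingual_error - monolingual_error) / monolingual_error

gap = multilingual_gap(13.15, 10.0)  # hypothetical phonetic-discrimination error rates (%)
```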


Assessing the Role of Data Quality in Training Bilingual Language Models

Seto, Skyler, ter Hoeve, Maartje, de Seyssel, Maureen, Grangier, David

arXiv.org Artificial Intelligence

Bilingual and multilingual language models offer a promising path toward scaling NLP systems across diverse languages and users. However, their performance often varies widely between languages: prior work shows that adding more languages can degrade performance for some languages (such as English) while improving others (typically more data-constrained languages). In this work, we investigate the causes of these inconsistencies by comparing bilingual and monolingual language models. Our analysis reveals that unequal data quality, not just data quantity, is a major driver of performance degradation in bilingual settings. We propose a simple yet effective data filtering strategy to select higher-quality bilingual training data using only high-quality English data. Applied to French, German, and Chinese, our approach improves monolingual performance by 2-4% and reduces bilingual model performance gaps to 1%. These results highlight the overlooked importance of data quality in multilingual pretraining and offer a practical recipe for balancing performance.
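The filtering recipe itself is not detailed in the abstract, but the generic shape of quality-based data selection is: score every document with a quality model and keep the top fraction. In the sketch below, `score_fn` stands in for a real quality model (e.g. a classifier trained to separate a high-quality English corpus from generic web text); using document length as the score here is purely a placeholder, and none of this is the paper's exact procedure.

```python
def filter_top_fraction(docs, score_fn, keep_fraction=0.5):
    # Keep the highest-scoring fraction of documents according to a
    # pluggable quality scorer.
    scored = sorted(docs, key=score_fn, reverse=True)
    k = max(1, int(len(scored) * keep_fraction))
    return scored[:k]

docs = [
    "short",
    "a longer, well formed sentence.",
    "zz",
    "another complete sentence here.",
]
# Placeholder scorer: document length. A real pipeline would plug in a
# learned quality classifier here.
kept = filter_top_fraction(docs, score_fn=len, keep_fraction=0.5)
```

The interesting part of the paper's result is that the scorer can be built from high-quality English data alone and still select good data in French, German, and Chinese.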


The mutual exclusivity bias of bilingual visually grounded speech models

Oneata, Dan, Nortje, Leanne, Matusevych, Yevgen, Kamper, Herman

arXiv.org Artificial Intelligence

Mutual exclusivity (ME) is a strategy where a novel word is associated with a novel object rather than a familiar one, facilitating language learning in children. Recent work has found an ME bias in a visually grounded speech (VGS) model trained on English speech with paired images. But ME has also been studied in bilingual children, who may employ it less due to cross-lingual ambiguity. We explore this pattern computationally using bilingual VGS models trained on combinations of English, French, and Dutch. We find that bilingual models generally exhibit a weaker ME bias than monolingual models, though exceptions exist. Analyses show that the combined visual embeddings of bilingual models have a smaller variance for familiar data, partly explaining the increase in confusion between novel and familiar concepts. We also provide new insights into why the ME bias exists in VGS models in the first place. Code and data: https://github.com/danoneata/me-vgs
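Measuring an ME bias in a VGS model typically reduces to a two-alternative test: given a novel spoken word and a pair of images (one novel object, one familiar object), check how often the model's audio-visual similarity favors the novel object. The scoring function below is a minimal sketch of that evaluation; the similarity numbers are made up, and this is not the paper's exact protocol.

```python
import numpy as np

def me_bias_score(sims_novel, sims_familiar):
    # Fraction of trials in which a novel word is more similar to the novel
    # image than to the familiar one; 0.5 = no bias, 1.0 = full ME bias.
    sims_novel = np.asarray(sims_novel)
    sims_familiar = np.asarray(sims_familiar)
    return float(np.mean(sims_novel > sims_familiar))

# Toy per-trial similarities (hypothetical numbers):
score = me_bias_score([0.8, 0.6, 0.7, 0.4], [0.3, 0.5, 0.9, 0.2])
```

Under this metric, the paper's finding would read as bilingual models scoring closer to 0.5 than monolingual ones.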


On the Acquisition of Shared Grammatical Representations in Bilingual Language Models

Arnett, Catherine, Chang, Tyler A., Michaelov, James A., Bergen, Benjamin K.

arXiv.org Artificial Intelligence

While crosslingual transfer is crucial to contemporary language models' multilingual capabilities, how it occurs is not well understood. In this paper, we ask what happens to a monolingual language model when it begins to be trained on a second language. Specifically, we train small bilingual models for which we control the amount of data for each language and the order of language exposure. To find evidence of shared multilingual representations, we turn to structural priming, a method used to study grammatical representations in humans. We first replicate previous crosslingual structural priming results and find that after controlling for training data quantity and language exposure, there are asymmetrical effects across language pairs and directions. We argue that this asymmetry may shape hypotheses about human structural priming effects. We also find that structural priming effects are less robust for less similar language pairs, highlighting potential limitations of crosslingual transfer learning and shared representations for typologically diverse languages.
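Structural priming in language models is commonly quantified as the change in the model's probability of a target syntactic structure after a congruent versus an incongruent prime sentence. The sketch below shows that core computation on made-up probabilities; the function name and numbers are illustrative assumptions, not the paper's exact measure.

```python
import math

def priming_effect(p_target_after_congruent, p_target_after_incongruent):
    # Structural priming effect as a log-probability difference: positive
    # values mean the congruent prime boosts the target structure.
    return math.log(p_target_after_congruent) - math.log(p_target_after_incongruent)

# Hypothetical model probabilities of producing, say, a passive after a
# passive prime vs. after an active prime:
effect = priming_effect(0.12, 0.08)
```

Crosslingual priming repeats this with the prime in one language and the target in the other; the paper's asymmetry finding corresponds to this effect differing by direction within a language pair.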


Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings

Mohr, Isabelle, Krimmel, Markus, Sturua, Saba, Akram, Mohammad Kalim, Koukounas, Andreas, Günther, Michael, Mastrapas, Georgios, Ravishankar, Vinit, Martínez, Joan Fontanals, Wang, Feng, Liu, Qi, Yu, Ziniu, Fu, Jie, Ognawala, Saahil, Guzman, Susana, Wang, Bo, Werk, Maximilian, Wang, Nan, Xiao, Han

arXiv.org Artificial Intelligence

We introduce a novel suite of state-of-the-art bilingual text embedding models that are designed to support English and another target language. These models are capable of processing lengthy text inputs with up to 8192 tokens, making them highly versatile for a range of natural language processing tasks such as text retrieval, clustering, and semantic textual similarity (STS) calculations. By focusing on bilingual models and introducing a unique multi-task learning objective, we have significantly improved model performance on STS tasks, surpassing existing multilingual models in both target-language understanding and cross-lingual evaluation tasks. Moreover, our bilingual models are more efficient, requiring fewer parameters and less memory due to their smaller vocabulary needs. Furthermore, we have expanded the Massive Text Embedding Benchmark (MTEB) to include benchmarks for German and Spanish embedding models. This integration aims to stimulate further research and advancement in text embedding technologies for these languages.
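The abstract does not specify the contrastive objective's exact form, but the standard building block for training paired text embeddings is a symmetric InfoNCE loss with in-batch negatives. The sketch below implements that textbook loss in NumPy; the temperature value and toy embeddings are assumptions, not the paper's configuration.

```python
import numpy as np

def info_nce(emb_a, emb_b, temperature=0.05):
    # Symmetric InfoNCE over a batch of paired embeddings: row i of emb_a
    # should match row i of emb_b; all other rows act as in-batch negatives.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def xent(l):
        # Cross-entropy with the diagonal (the true pair) as the target,
        # computed via a numerically stable log-softmax.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -float(np.mean(np.diag(logp)))

    return (xent(logits) + xent(logits.T)) / 2

pairs = np.eye(4)  # four perfectly aligned toy embeddings
loss_aligned = info_nce(pairs, pairs)
loss_shuffled = info_nce(pairs, np.roll(pairs, 1, axis=0))
```

A multi-task setup would combine a loss like this with additional task-specific terms (e.g. an STS regression loss), which is where the paper's objective departs from the plain version shown here.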